Skip to content

feat(loop-detection): defer warning injection#2752

Draft
ggnnggez wants to merge 5 commits intobytedance:mainfrom
ggnnggez:fix/loop-detection-tool-call-pairing
Draft

feat(loop-detection): defer warning injection#2752
ggnnggez wants to merge 5 commits intobytedance:mainfrom
ggnnggez:fix/loop-detection-tool-call-pairing

Conversation

@ggnnggez
Copy link
Copy Markdown
Contributor

@ggnnggez ggnnggez commented May 6, 2026

Fixes #2733.

keeping this PR as draft until the 2.0-m1 patch is released.

Summary:

  • Defer loop warning injection from after_model to wrap_model_call/awrap_model_call so tool-call/tool-message pairing stays intact.
  • Queue loop warnings per thread and append them as HumanMessage(name="loop_warning") on the next model request.
  • Remove the temporary channel-side loop-warning display filter now that warnings no longer mutate AIMessage content.
  • Pending warnings are scoped by (thread_id, run_id); before_agent clears stale warnings from older runs, and after_agent clears undelivered warnings for the current run.
  • Multiple pending warnings in the same run are deduplicated and merged into one hidden HumanMessage(name="loop_warning").

Tests:

  • uv run pytest tests/test_loop_detection_middleware.py
  • uv run pytest tests/test_channels.py tests/test_loop_detection_middleware.py

@ggnnggez ggnnggez closed this May 6, 2026
@ggnnggez ggnnggez reopened this May 6, 2026
@ggnnggez ggnnggez marked this pull request as draft May 6, 2026 16:05
@ggnnggez ggnnggez marked this pull request as draft May 6, 2026 16:05
@M-Raquel
Copy link
Copy Markdown

M-Raquel commented May 7, 2026

I’m still pretty new to this part of the stack, so I appreciate how clearly the behavior is explained here. It helped clarify why injecting the warning earlier caused issues with the tool‑call pairing rules. One thing I’m curious about: could multiple queued warnings accumulate in edge cases, or does the drain logic ensure only one warning is ever surfaced per cycle? I’m trying to understand how this behaves under repeated loop conditions.

@ggnnggez
Copy link
Copy Markdown
Contributor Author

ggnnggez commented May 7, 2026

@M-Raquel
Thanks for the insightful comment. Short answer: after the latest updates, queued warnings are now scoped by both thread and run, and multiple pending warnings are merged before injection.

Under the normal agent loop, accumulation should still be uncommon. A warning is enqueued in after_model only after a repeated tool-call pattern is detected, then drained on the next wrap_model_call, after the corresponding tool responses have already been added. That preserves the provider-required AIMessage(tool_calls) -> ToolMessage pairing.

The important edge case is when a run stops before that next model call, for example interruption or failure during tool execution. The previous version could leave a thread-scoped warning behind and drain it into a later invocation for the same thread. The latest update scopes _pending_warnings by (thread_id, run_id), clears stale warnings for older runs in before_agent, and clears undelivered warnings for the current run in after_agent, so a warning from one run should not leak into another run.

If multiple warnings are pending inside the same run, they are deduplicated and merged into a single HumanMessage(name="loop_warning") appended at the end of the outgoing request. So the behavior is protocol-safe and now semantically scoped to the run as well.

@ggnnggez ggnnggez changed the title fix(loop-detection): defer warning injection feat(loop-detection): defer warning injection May 7, 2026
ggnnggez and others added 4 commits May 8, 2026 11:03
The warn branch in LoopDetectionMiddleware injected a HumanMessage
into state from after_model. The tools node had not yet produced
ToolMessage responses to the previous AIMessage(tool_calls=...), so
the new HumanMessage landed *between* the assistant's tool_calls and
their responses. OpenAI/Moonshot reject the next request with
"tool_call_ids did not have response messages" because their
validators require tool_calls to be followed immediately by tool
messages.

Detection now runs in after_model as before, but only enqueues the
warning into a per-thread list. Injection happens in wrap_model_call,
where every prior ToolMessage is already present in request.messages.
The warning is appended at the end as HumanMessage(name="loop_warning")
— pairing intact, AIMessage semantics untouched, no SystemMessage
issues for Anthropic.

Closes bytedance#2029, addresses bytedance#2255 bytedance#2293 bytedance#2304 bytedance#2511.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ggnnggez ggnnggez force-pushed the fix/loop-detection-tool-call-pairing branch 2 times, most recently from 04bcacd to 549c1a6 Compare May 8, 2026 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Teach-debt] Defer LoopdetectionMiddleware warn injection from after_model to wrap_model_call

2 participants